Practical Glossing by Prioritised Tiling

Authors

  • Victor Poznanski
  • Pete Whitelock
Abstract

We present the design of a practical context-sensitive glosser, incorporating current techniques for lightweight linguistic analysis based on large-scale lexical resources. We outline a general model for ranking the possible translations of the words and expressions that make up a text. This information can be used by a simple resource-bounded algorithm, of complexity O(n log n) in sentence length, that determines a consistent gloss of best translations. We then describe how the results of the general ranking model may be approximated using a simple heuristic prioritisation scheme. Finally we present a preliminary evaluation of the glosser's performance.

1 Introduction

In a lexicalist MT framework such as Shake-and-Bake (Whitelock, 1994), translation equivalence is defined between collections of (suitably constrained) lexical material in the two languages. Such an approach has been shown to be effective in the description of many types of complex bilingual equivalence. However, the complexity of the associated parsing and generation phases leaves a system of this type some way from commercial exploitation. The parsing phase that is needed to establish adequate constraints on the words is of cubic complexity, while the most general generation algorithm, needed to order the words in the target text, is O(n^4) (Poznanski et al. 1996). In this paper, we show how a novel application domain, glossing, can be explored within such a framework, by omitting generation entirely and replacing syntactic parsing by a simple combination of morphological analysis and tagging. The poverty of constraints established in this way, and the consequent inaccuracy in translation, is mitigated by providing a menu of alternatives for each gloss. The gloss is automatically updated in the light of user choices.
While the availability of alternatives is generally desirable in automatic translation, it is the limitation to glossing which makes it feasible to manage the consistency maintenance required. Glossing as a technique for elucidating the grammar and lexis of a second language text is well-known from the linguistics literature. Each morpheme in the object language is provided with its meta-language equivalent aligned beneath it. Such a glosser may be used as a tool for second-language improvement (Nerbonne and Smit, 1996), and thus provide an educational alternative to the passive consumption of a (usually low quality) translation. We envisage the glosser's primary use as a tool for cross-language information gathering, and thus think it best not to display grammatical information. Our glosser improves on the use of printed or even on-line dictionaries in several ways:

  • The system performs lemmatisation for the user.
  • Lightweight analysis resolves part-of-speech ambiguities in context.
  • Multi-word expressions, including discontinuous and variable ones, are detected.
  • A degree of consistency between system and user choices is maintained.
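
The text above does not spell out the tiling algorithm itself. One plausible reading of "prioritised tiling" is a greedy selection: rank the candidate translations, then accept each one whose words are not already covered by a better candidate. The sketch below illustrates that reading only. The `Tile` type, the `tile_sentence` function, the priority values and the upper-case placeholder glosses are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tile:
    """A candidate gloss covering one or more token positions (hypothetical type)."""
    positions: frozenset  # token indices covered; may be discontinuous
    gloss: str            # placeholder target-language equivalent
    priority: float       # assumed score: higher means preferred


def tile_sentence(candidates):
    """Greedily pick non-overlapping tiles in descending priority order."""
    covered = set()
    chosen = []
    # Sorting costs O(m log m) for m candidates.
    for tile in sorted(candidates, key=lambda t: -t.priority):
        # Accept a tile only if none of its tokens is already glossed.
        if covered.isdisjoint(tile.positions):
            chosen.append(tile)
            covered |= tile.positions
    # Return the chosen tiles in left-to-right reading order.
    return sorted(chosen, key=lambda t: min(t.positions))


# Tokens: 0 "look", 1 "the", 2 "word", 3 "up"
candidates = [
    Tile(frozenset({0}), "LOOK", 1.0),
    Tile(frozenset({1}), "THE", 1.0),
    Tile(frozenset({2}), "WORD", 1.0),
    Tile(frozenset({3}), "UP", 1.0),
    Tile(frozenset({0, 3}), "CONSULT", 2.0),  # discontinuous "look ... up"
]
glosses = [t.gloss for t in tile_sentence(candidates)]
```

In this toy input the discontinuous expression outranks the single-word readings of "look" and "up", so those two are displaced while "the" and "word" keep their own glosses. A user choice could be modelled in the same scheme by raising the priority of the selected tile and re-running the selection.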

Similar resources

Practical Glossing by Prioritised Tiling

We present the design of a practical context-sensitive glosser, incorporating current techniques for lightweight linguistic analysis based on large-scale lexical resources. We outline a general model for ranking the possible translations of the words and expressions that make up a text. This information can be used by a simple resource-bounded algorithm, of complexity O(n log n) in sentence len...

The Sharp Intelligent Dictionary

This paper describes the Sharp Intelligent Dictionary (SID), an English-Japanese glossing system for Japanese readers and learners of English. SID uses a variety of lightweight analysis techniques, a large bilingual dictionary and a prioritised model of collocations to present informed guesses about the best translations of words and expressions in their context.

The Effect of Visual Representation, Textual Representation, and Glossing on Second Language Vocabulary Learning

In this study, the researcher chose three different vocabulary techniques (Visual Representation, Textual Enhancement, and Glossing) and compared them with the traditional method of teaching vocabulary. 80 advanced EFL learners were assigned to four intact groups (three experimental and one control group) using a proficiency test and a vocabulary test as a pre-test. In the visual group, stu...

The Ubiquity of the Gloss

This paper argues that glossing is an essential stage in the borrowing of writing systems. I use the term “glossing” in a somewhat extended sense to refer to a process where a text in one language is prepared (annotated, marked) to be read in another. I argue that this process of “vernacular reading” – reading a text written in the script, orthography, lexicon and grammar of a more prestigious ...

Publication date: 2002